Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(core)!: implement write returns metadata #5562

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

meteorgan
Copy link
Contributor

@meteorgan meteorgan commented Jan 19, 2025

Which issue does this PR close?

Part of #5557

Rationale for this change

What changes are included in this PR?

  • implement write returns metadata for Operator and BlockingOperator
  • enable write_has_content_length for all services and implement the logic for them
  • implement write returns metadata logic for service S3, fs, monoiofs, hdfs and webhdfs
  • other services return default Metadata after writing
  • skip return value for language bindings

Are there any user-facing changes?

Yes.

  • write, write_with in Operator returns Result<Metadata> instead of Return<()>
  • Writer.close() returns Return<Metadata> instead of Return<()>
  • write in BlockingOperator returns Return<Metadata> instead of Return<()>
  • FunctionWrite.call() used by BlockingOperator returns Return<Metadata> instead of Return<()>

@meteorgan
Copy link
Contributor Author

emmm, another issue related to ceph rados: the Etag returned by CompleteMultipartUpload doesn't follow the standard.
image

@Xuanwo
Copy link
Member

Xuanwo commented Jan 20, 2025

the Etag returned by CompleteMultipartUpload doesn't follow the standard.

Well, well, well...

Let's do our best to permit this since it's not harmful. Technically, we don't allow users to parse the ETag value; instead, they should store it and compare it with values from the same source. I'm guessing the issue lies in part of our code assuming that the ETag must contain "?

@meteorgan
Copy link
Contributor Author

the Etag returned by CompleteMultipartUpload doesn't follow the standard.

Well, well, well...

Let's do our best to permit this since it's not harmful. Technically, we don't allow users to parse the ETag value; instead, they should store it and compare it with values from the same source. I'm guessing the issue lies in part of our code assuming that the ETag must contain "?

No. The issue is that I compare the Etag from stat with the one from write in the tests to ensure we get the correct Etag if it's returned by write. However, the value returned from stat contain " while the value returned from write does not. they aren't consistent with each other, even within the same service !

@Xuanwo
Copy link
Member

Xuanwo commented Jan 20, 2025

No. The issue is that I compare the Etag from stat with the one from write in the tests to ensure we get the correct Etag if it's returned by write. However, the value returned from stat contain " while the value returned from write does not. they aren't consistent with each other, even within the same service !

That's really surprising. Let's disable the Ceph RADOS test and create an issue to track it. I believe such a bug isn't worth our workaround.

@meteorgan
Copy link
Contributor Author

That's really surprising. Let's disable the Ceph RADOS test and create an issue to track it. I believe such a bug isn't worth our workaround.

I've done some tests. it's very interesting that stat_with().if_match(etag) works fine regardless of whether the ETag includes the prefix and suffix " or not.

image

@meteorgan
Copy link
Contributor Author

I've done some tests. it's very interesting that stat_with().if_match(etag) works fine regardless of whether the ETag includes the prefix and suffix " or not.

The results are the same across Ceph RADOS, Minio and S3

@meteorgan meteorgan force-pushed the write_returns_metadata_for_s3 branch from 81aeb13 to ddc3de5 Compare January 20, 2025 13:47
@meteorgan
Copy link
Contributor Author

That's really surprising. Let's disable the Ceph RADOS test and create an issue to track it. I believe such a bug isn't worth our workaround.

I still created a workaround by adding prefix and suffix " for Etag in the test if they are not included. I think this approach is simpler, what do you think ?

@meteorgan meteorgan marked this pull request as ready for review January 20, 2025 14:03
core/src/raw/adapters/kv/backend.rs Outdated Show resolved Hide resolved
core/src/raw/adapters/kv/backend.rs Outdated Show resolved Hide resolved
core/src/raw/adapters/typed_kv/backend.rs Outdated Show resolved Hide resolved
core/src/raw/adapters/typed_kv/backend.rs Outdated Show resolved Hide resolved
core/src/raw/oio/write/append_write.rs Outdated Show resolved Hide resolved
core/src/services/aliyun_drive/writer.rs Outdated Show resolved Hide resolved
core/src/services/s3/writer.rs Show resolved Hide resolved
core/tests/behavior/async_write.rs Outdated Show resolved Hide resolved
core/tests/behavior/async_write.rs Outdated Show resolved Hide resolved
@Xuanwo
Copy link
Member

Xuanwo commented Jan 20, 2025

Perhaps we should introduce write_has_content_type flags to indicate which metadata is available in the meta returned by the write operation. Checking the meta.content_type() may pass the test but is meaningless to users; it could also potentially conceal incorrect implementations that developers can simplely return Metadata::default() to pass all tests. And it makes it more difficult for us to determine the implementation status of various services.

@meteorgan
Copy link
Contributor Author

meteorgan commented Jan 21, 2025

Perhaps we should introduce write_has_content_type flags to indicate which metadata is available in the meta returned by the write operation.

Would write_returns_content_type be a better name?

Checking the meta.content_type() may pass the test but is meaningless to users; it could also potentially conceal incorrect implementations that developers can simplely return Metadata::default() to pass all tests.

if a service returns Metadata::default(), it indicates that it implements nothing. There's a chance that a developer intended to return some contents but forgot set the fields. Unfortunately, the tests can't catch this.

And it makes it more difficult for us to determine the implementation status of various services.

Make sense ! Making it explicit is better.

@meteorgan
Copy link
Contributor Author

No. The issue is that I compare the Etag from stat with the one from write in the tests to ensure we get the correct Etag if it's returned by write. However, the value returned from stat contain " while the value returned from write does not. they aren't consistent with each other, even within the same service !

Wow, they've already had a PR for this issue: ceph/ceph#60477

@meteorgan
Copy link
Contributor Author

Gently push. Hi, @Xuanwo I've provided some replies, do you have any suggestions ?

@Xuanwo
Copy link
Member

Xuanwo commented Jan 24, 2025

Would write_returns_content_type be a better name?

I prefer to align with exists stat_has_xxx and list_has_xxx.

Wow, they've already had a PR for this issue: ceph/ceph#60477

That's nice, maybe we can disable ceph test for now...

@meteorgan meteorgan force-pushed the write_returns_metadata_for_s3 branch from ddc3de5 to a1e0b0a Compare January 26, 2025 12:53
@meteorgan meteorgan changed the title feat(core)!: implement write returns metadata for s3 feat(core)!: implement write returns metadata Jan 26, 2025
@meteorgan meteorgan force-pushed the write_returns_metadata_for_s3 branch from 9482e08 to d126087 Compare January 27, 2025 11:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants